31 research outputs found

    Query-guided End-to-End Person Search

    Person search has recently gained attention as the novel task of finding a person, provided as a cropped sample, in a gallery of non-cropped images in which several other people are also visible. We believe that (i) person detection and re-identification should be pursued in a joint optimization framework and that (ii) person search should leverage the query image extensively (e.g. by emphasizing unique query patterns). However, no prior art has realized this so far. We introduce a novel query-guided end-to-end person search network (QEEPS) to address both aspects. We build on a recent joint detection and re-identification work, OIM [37]. We extend it with (i) a query-guided Siamese squeeze-and-excitation network (QSSE-Net) that uses global context from both the query and gallery images, (ii) a query-guided region proposal network (QRPN) to produce query-relevant proposals, and (iii) a query-guided similarity subnetwork (QSimNet) to learn a query-guided re-identification score. QEEPS is the first end-to-end query-guided detection and re-id network. On both the recent CUHK-SYSU [37] and PRW [46] datasets, we outperform the previous state of the art by a large margin. Comment: Accepted as poster in CVPR 201

    Temperature Dependence of a Sub-wavelength Compact Graphene Plasmon-Slot Modulator

    We investigate a plasmonic electro-optic modulator with an extinction ratio exceeding 1 dB/µm, achieved by engineering the optical mode to be in-plane with the graphene layer. We show how cooling graphene enables steeper switching, thus improving dynamic energy consumption. Further, we show that multi-layer graphene integrated with a plasmonic slot waveguide allows for in-plane electric field components and 3-dB device lengths as short as several hundred nanometers. Compact modulators approaching electronic length scales pave the way for ultra-dense photonic integrated circuits with the smallest footprint.
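
To relate the two figures quoted in this abstract (extinction ratio per unit length and 3-dB device length), the arithmetic can be sketched as below. This is only an illustration: it assumes modulation depth grows linearly with device length, and the function name and the 10 dB/µm value are hypothetical, not taken from the paper.

```python
def length_for_modulation_um(extinction_db_per_um, target_db=3.0):
    """Device length in micrometres needed to reach `target_db` of modulation.

    Assumes (for illustration only) that modulation depth scales linearly
    with length, i.e. depth_dB = extinction_db_per_um * length_um.
    """
    if extinction_db_per_um <= 0:
        raise ValueError("extinction ratio must be positive")
    return target_db / extinction_db_per_um


# Lower bound quoted in the abstract: 1 dB/um.
print(length_for_modulation_um(1.0))
# Hypothetical higher extinction ratio, chosen only to show the scaling.
print(length_for_modulation_um(10.0))
```

Under this linear assumption, a 3-dB length of a few hundred nanometers corresponds to an extinction ratio several times higher than the 1 dB/µm lower bound.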

    Class interference regularization

    Contrastive losses yield state-of-the-art performance for person re-identification, face verification and few-shot learning. They have recently outperformed the cross-entropy loss on classification at the ImageNet scale and outperformed all prior self-supervised results by a large margin (SimCLR). Simple and effective regularization techniques such as label smoothing and self-distillation no longer apply, because they act on the multinomial label distributions used in cross-entropy losses, and not on the tuple-comparison terms that characterize contrastive losses. Here we propose a novel, simple and effective regularization technique, Class Interference Regularization (CIR), which applies to cross-entropy losses but is especially effective on contrastive losses. CIR perturbs the output features by randomly moving them towards the average embeddings of the negative classes. To the best of our knowledge, CIR is the first regularization technique to act on the output features. In experimental evaluation, the combination of CIR and a plain Siamese net with triplet loss yields the best few-shot learning performance on the challenging tieredImageNet. CIR also improves the state-of-the-art technique in person re-identification on the Market-1501 dataset, based on triplet loss, and the state-of-the-art technique in person search on the CUHK-SYSU dataset, based on a cross-entropy loss. Finally, on the task of classification, CIR performs on par with the popular label smoothing, as demonstrated for CIFAR-10 and CIFAR-100.
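
The perturbation described in this abstract (moving output features towards average embeddings of negative classes) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: picking a single random negative class, the uniform step size, the `strength` parameter and both function names are assumptions made here.

```python
import random


def class_means(embeddings, labels):
    """Average embedding per class label."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def perturb_towards_negative(vec, label, means, strength, rng):
    """Move `vec` a random fraction (at most `strength`) of the way towards
    the mean embedding of one randomly chosen negative class.

    Assumption: one negative class is sampled per call; the abstract does not
    specify how negative-class averages are selected or combined.
    """
    negatives = [lab for lab in means if lab != label]
    if not negatives:
        return list(vec)  # nothing to interfere with
    target = means[rng.choice(negatives)]
    step = rng.uniform(0.0, strength)
    return [v + step * (t - v) for v, t in zip(vec, target)]


rng = random.Random(0)
feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labs = ["a", "a", "b"]
means = class_means(feats, labs)
print(perturb_towards_negative(feats[0], "a", means, strength=0.5, rng=rng))
```

In a training loop, the perturbed features would be fed to the loss in place of the original ones; how and when that is done in the paper is not stated in the abstract.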

    Joint Detection and Tracking in Videos with Identification Features

    Recent works have shown that combining object detection and tracking on video data results in higher performance for both tasks, but they strictly require a high frame rate. This assumption is often violated in real-world applications, where models run on embedded devices, often at only a few frames per second. Videos at low frame rates suffer from large object displacements. Here re-identification features may help match detections of objects that have moved far between frames, but current joint detection and re-identification formulations degrade the detector performance, as the two tasks are in conflict. In real-world applications, having separate detector and re-id models is often not feasible, as both memory and runtime effectively double. Towards robust long-term tracking applicable to devices with reduced computational power, we propose the first joint optimization of detection, tracking and re-identification features for videos. Notably, our joint optimization maintains the detector performance, a typical multi-task challenge. At inference time, we leverage detections for tracking (tracking-by-detection) when the objects are visible, detectable and slowly moving in the image. We instead leverage re-identification features to match objects which disappeared for several frames (e.g. due to occlusion) or were not tracked due to fast motion (or low-frame-rate videos). Our proposed method reaches the state of the art on MOT, and it ranks 1st among online trackers and 3rd overall in the UA-DETRAC'18 tracking challenge. Comment: Accepted at Image and Vision Computing Journa
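
The inference-time strategy described in this abstract (spatial matching when objects move slowly, re-identification features otherwise) could look roughly like the two-pass association below. This is a hedged sketch: the greedy matching, the IoU and cosine-similarity scores, the thresholds and all names are assumptions for illustration, not the method from the paper.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def _greedy_pass(tracks, detections, score_fn, thresh, matches, used):
    """Give each still-unmatched track its best free detection above `thresh`."""
    for ti, track in enumerate(tracks):
        if ti in matches:
            continue
        best, best_score = None, thresh
        for di, det in enumerate(detections):
            if di in used:
                continue
            score = score_fn(track, det)
            if score >= best_score:
                best, best_score = di, score
        if best is not None:
            matches[ti] = best
            used.add(best)


def associate(tracks, detections, iou_thresh=0.5, sim_thresh=0.7):
    """Return {track_index: detection_index}.

    Pass 1 matches by box overlap (slow, visible objects); pass 2 falls back
    to re-id feature similarity (occluded or fast-moving objects).
    """
    matches, used = {}, set()
    _greedy_pass(tracks, detections, lambda t, d: iou(t["box"], d["box"]),
                 iou_thresh, matches, used)
    _greedy_pass(tracks, detections, lambda t, d: cosine(t["feat"], d["feat"]),
                 sim_thresh, matches, used)
    return matches
```

A track whose box no longer overlaps any detection, for example after a large displacement at a low frame rate, can still be recovered in the second pass if its stored feature is close to a detection's feature.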

    Coherent Multi-Sentence Video Description with Variable Level of Detail

    Humans can easily describe what they see in a coherent way and at varying levels of detail. However, existing approaches for automatic video description mainly focus on single-sentence generation and produce descriptions at a fixed level of detail. In this paper, we address both of these limitations: for a variable level of detail, we produce coherent multi-sentence descriptions of complex videos. We follow a two-step approach where we first learn to predict a semantic representation (SR) from video and then generate natural language descriptions from the SR. To produce consistent multi-sentence descriptions, we model across-sentence consistency at the level of the SR by enforcing a consistent topic. We also contribute both to the visual recognition of objects, proposing a hand-centric approach, and to the robust generation of sentences using a word lattice. Human judges rate our multi-sentence descriptions as more readable, correct, and relevant than those of related work. To understand the difference between more detailed and shorter descriptions, we collect and analyze a video description corpus with three levels of detail. Comment: 10 page

    Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets

    In this work, we explore the correlation between people's trajectories and their head orientations. We argue that trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches to trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for visual attention when modeling social interactions. MX-LSTM predicts future pedestrian locations and head poses, extending the standard capabilities of current approaches to long-term trajectory forecasting. Compared to the state of the art, our approach performs better on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions on a longer time horizon. Comment: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.0065

    Substantial and sustained reduction in under-5 mortality, diarrhea, and pneumonia in Oshikhandass, Pakistan : Evidence from two longitudinal cohort studies 15 years apart

    Funding Information: Study 1 was funded through the Applied Diarrheal Disease Research Program at Harvard Institute for International Development with a grant from USAID (Project 936–5952, Cooperative Agreement # DPE-5952-A-00-5073-00), and the Aga Khan Health Service, Northern Areas and Chitral, Pakistan. Study 2 was funded by the Pakistan US S&T Cooperative Agreement between the Pakistan Higher Education Commission (HEC) (No. 4–421/PAK-US/HEC/2010/955, grant to the Karakoram International University) and US National Academies of Science (Grant Number PGA-P211012 from NAS to the Fogarty International Center). The funding bodies had no role in the design of the study, data collection, analysis, interpretation, or writing of the manuscript. Publisher Copyright: © 2020 The Author(s). Peer reviewed. Publisher PD

    Capturing road users on a traffic route

    The invention relates to a method (10) for capturing road users (12) on a traffic route (14) in an image, comprising: generating (42) a plurality of region proposals (18) for possible objects recorded in the image (16) by applying a region proposal generator; providing object detection (72) for all region proposals (18) in order to detect the traffic route (14) and/or the road users (12) by classification, taking into account a predefined confidence level; outputting detection data received through the object detection; and providing filtering (48) of the region proposals (18) before the step of providing object detection, the filtering being carried out on the basis of respective filter data that are estimated from a relevance of the region proposals (18) in relation to the road users (12) and/or the traffic route (14).
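
The order of steps in this abstract (generate region proposals, filter them by estimated relevance, then classify the survivors against a confidence level) can be sketched as a small pipeline. This is only an illustration of the step order: the function names, the thresholds and the toy relevance and classification functions are hypothetical, and the abstract does not say how relevance is estimated.

```python
def detect_road_users(proposals, relevance_fn, classify_fn,
                      min_relevance=0.3, min_confidence=0.5):
    """Filter region proposals by relevance, then classify only the survivors.

    proposals:    list of boxes (x1, y1, x2, y2) from some proposal generator
    relevance_fn: box -> estimated relevance in [0, 1] (hypothetical)
    classify_fn:  box -> (label, confidence) (hypothetical)
    """
    # Filtering happens before object detection, so the classifier runs
    # on fewer proposals.
    kept = [box for box in proposals if relevance_fn(box) >= min_relevance]
    detections = []
    for box in kept:
        label, confidence = classify_fn(box)
        if confidence >= min_confidence:  # predefined confidence level
            detections.append({"box": box, "label": label,
                               "confidence": confidence})
    return detections


# Toy stand-ins, chosen only to make the example runnable.
def toy_relevance(box):
    # Treat boxes in the lower half of a 100-pixel-high image as relevant.
    return 1.0 if box[3] > 50 else 0.0


def toy_classifier(box):
    width = box[2] - box[0]
    return ("car", 0.9) if width > 20 else ("pedestrian", 0.4)


boxes = [(0, 0, 30, 20), (10, 40, 50, 90), (60, 60, 70, 95)]
print(detect_road_users(boxes, toy_relevance, toy_classifier))
```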